In this exercise, we apply the skills learned in the last two lessons to create data visualisations using ggplot2.
Problem Statement: In this Take-Home Exercise, we apply appropriate interactivity and animation methods to design an age-sex pyramid visualisation that shows how the demographic structure of Singapore changed by age cohort and gender between 2000 and 2020 at the planning area level.
The two datasets provided are Singapore Residents by Planning Area / Subzone, Age Group, Sex and Type of Dwelling, June 2000-2010 and Singapore Residents by Planning Area / Subzone, Age Group, Sex and Type of Dwelling, June 2011-2020, both downloaded from the Department of Statistics website.
Before we get started, we need to install and load the relevant R packages.
packages <- c('ggpubr', 'tidyverse', 'readxl', 'knitr', 'gganimate', 'gapminder', 'gifski', 'manipulate', 'kableExtra', 'plotly')
# Install any package that is missing, then load it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
First, let us import the data into R by using the read.csv function. Note that there are two datasets.
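The import step is not shown in the text, so the following is only a minimal sketch of what it might look like. The object names pop2010 and pop2020 match the code used later; the file paths and the toy rows are stand-ins (assumptions) so that the snippet runs without the real downloads. In the actual exercise, read.csv() would point at the two files from the Department of Statistics.

```r
# Toy stand-in data; column names follow those used later in this exercise
toy <- data.frame(
  PA   = c("Ang Mo Kio", "Bedok"),
  AG   = c("0_to_4", "0_to_4"),
  Sex  = c("Females", "Females"),
  Pop  = c(10L, 20L),      # illustrative values only
  Time = c(2000L, 2000L)
)

# Write two temporary CSV files that play the role of the two downloads
path_a <- tempfile(fileext = ".csv")
path_b <- tempfile(fileext = ".csv")
write.csv(toy, path_a, row.names = FALSE)
write.csv(transform(toy, Time = 2011L), path_b, row.names = FALSE)

# Import each file into its own data frame
pop2010 <- read.csv(path_a, stringsAsFactors = FALSE)
pop2020 <- read.csv(path_b, stringsAsFactors = FALSE)

# rbind() later requires identical column names, so check this up front
stopifnot(identical(names(pop2010), names(pop2020)))
str(pop2010)
```

Checking the column names at import time makes a mismatch between the two files visible before they are combined.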
In this particular problem, we would like to visualise the distribution of age groups in a population. An ordinary bar chart could show this, but a pyramid chart places males and females on opposite sides of a shared age axis, which makes the two sexes easier to compare.
A few challenges had to be addressed before the age-sex pyramid could be created.
In order to resolve the challenges:
Firstly, the two data frames were combined row-wise using rbind(). Using the group_by() function, the data were grouped by age group, sex, time and planning area. Next, the summarise() function summed the Pop column within each group to give the number of residents. Lastly, the Time column was converted to integers using the as.integer() function.
As described in the challenges above, some planning areas had no residents. To remove them, the filter() function was used to find planning areas with a count of 0, and the select() function was used to extract these planning areas into a list.
Lastly, the dataset was filtered so that the planning areas in the list were removed.
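# Stack the 2000-2010 and 2011-2020 data frames
popcombine <- rbind(pop2010, pop2020)
# Total residents per age group, sex, year and planning area
uncleanpop <- popcombine %>%
  group_by(AG, Sex, Time, PA) %>%
  summarise(Count = sum(Pop)) %>%
  ungroup()
# Store the year as an integer
uncleanpop$Time <- as.integer(uncleanpop$Time)
kable(head(uncleanpop))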
| AG | Sex | Time | PA | Count |
|---|---|---|---|---|
| 0_to_4 | Females | 2000 | Ang Mo Kio | 4460 |
| 0_to_4 | Females | 2000 | Bedok | 7800 |
| 0_to_4 | Females | 2000 | Bishan | 2560 |
| 0_to_4 | Females | 2000 | Boon Lay/Pioneer | 0 |
| 0_to_4 | Females | 2000 | Bukit Batok | 4400 |
| 0_to_4 | Females | 2000 | Bukit Merah | 3240 |
| AG | Sex | Time | PA | Count |
|---|---|---|---|---|
| 0_to_4 | Females | 2000 | Ang Mo Kio | 4460 |
| 0_to_4 | Females | 2000 | Bedok | 7800 |
| 0_to_4 | Females | 2000 | Bishan | 2560 |
| 0_to_4 | Females | 2000 | Bukit Batok | 4400 |
| 0_to_4 | Females | 2000 | Bukit Merah | 3240 |
| 0_to_4 | Females | 2000 | Bukit Panjang | 3690 |
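The code for removing empty planning areas is not shown above, so here is a minimal sketch of how the described filter() and select() steps might be written. It runs on a toy stand-in for uncleanpop with illustrative values. The name empty_pa is hypothetical, and the rule used here (drop any planning area that has a zero-count row) is an assumption based on the description.

```r
library(dplyr)

# Toy stand-in for uncleanpop (illustrative values only)
uncleanpop <- data.frame(
  AG    = rep("0_to_4", 4),
  Sex   = rep(c("Females", "Males"), 2),
  Time  = rep(2000L, 4),
  PA    = rep(c("Bishan", "Boon Lay/Pioneer"), each = 2),
  Count = c(2560, 2700, 0, 0)
)

# Planning areas that have at least one zero-count row
empty_pa <- uncleanpop %>%
  filter(Count == 0) %>%
  select(PA) %>%
  distinct() %>%
  pull(PA)

# Keep only the planning areas that are not in that list
popdata <- uncleanpop %>%
  filter(!PA %in% empty_pa)

popdata
```

A gentler alternative would be to sum Count per planning area first and drop only those areas whose total is 0. Which rule is appropriate depends on whether a single empty age group should disqualify a whole planning area.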
As in the previous Take-Home Exercise 1, the male population counts were multiplied by -1 so that the male bars extend to the left of zero on the x-axis, giving the chart its pyramid shape.
popdata$nCount = ifelse(popdata$Sex == "Males",
yes = -popdata$Count,
no = popdata$Count)
With the filtered data, an animated chart was plotted to show the total population over the years regardless of planning area.
anifig <-
ggplot(popdata, aes(x = nCount, y = AG, fill = Sex)) +
geom_col() +
scale_x_continuous(breaks = seq(-150000, 150000, 50000),
labels = paste0(as.character(c(seq(150, 0, -50), seq(50, 150, 50))),"k")) +
labs (x = "Population",
y = "Age Group",
title='Singapore Age-Sex Population Pyramid') +
theme_bw() +
theme(axis.ticks.y = element_blank())
anifig +
transition_time(Time) +
ease_aes('linear') +
labs (subtitle = 'Year: {frame_time}')
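Because the male counts are stored as negative numbers, the x-axis needs labels that show absolute values. As a small sketch in base R, the same breaks and labels passed to scale_x_continuous() above can be derived from a single vector using abs(), which keeps the two in step if the range ever changes.

```r
# Symmetric breaks around zero, in steps of 50,000
breaks <- seq(-150000, 150000, 50000)

# Labels show the magnitude in thousands, regardless of sign
labels <- paste0(abs(breaks) / 1000, "k")

labels
```

These two vectors could be supplied as the breaks and labels arguments of scale_x_continuous() in place of the hand-written sequences.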
The animated chart is useful, but it can be customised further by adding filters for Planning Area and Time.
To accomplish this, two for loops build one control entry for each unique value in the Time column and in the Planning Area column. Each entry tells plotly to update the value of the corresponding filter when it is selected.
Besides these lists, an annotation is created to label the Planning Area control in the interactive chart displayed later.
year_list <- list()
for (i in 1:length(unique(popdata$Time))) {
year_list[[i]] <- list(method = "restyle",
args = list("transforms[0].value", unique(popdata$Time)[i]),
label = unique(popdata$Time)[i])
}
PA_list <- list()
for (j in 1:length(unique(popdata$PA))) {
PA_list[[j]] <- list(method = "restyle",
args = list("transforms[1].value", unique(popdata$PA)[j]),
label = unique(popdata$PA)[j])
}
annot <- list(list(text = "Select Planning Area:",
x = 1.61,
y = 0.63,
xref = 'paper',
yref = 'paper',
showarrow = FALSE))
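The two loops above follow the same pattern, so they could also be written once as a small helper. The sketch below uses base R only. The function name make_buttons is hypothetical, the toy vectors stand in for popdata$Time and popdata$PA, and converting the label with as.character() is a choice made here rather than something the original code does.

```r
# Hypothetical helper: one "restyle" entry per unique value, targeting
# the filter transform at position `index` (zero-based, as in the loops above)
make_buttons <- function(values, index) {
  lapply(unique(values), function(v) {
    list(method = "restyle",
         args   = list(paste0("transforms[", index, "].value"), v),
         label  = as.character(v))
  })
}

# Toy vectors standing in for popdata$Time and popdata$PA
years <- c(2000L, 2000L, 2001L)
areas <- c("Bedok", "Bishan", "Bedok")

year_list <- make_buttons(years, 0)
PA_list   <- make_buttons(areas, 1)

str(PA_list[[1]])
```

The index argument must match the position of the corresponding filter in the transforms list passed to plot_ly(), which is why Time uses 0 and Planning Area uses 1.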
Finally, the interactive chart was generated using plot_ly(). A tooltip was added for greater clarity, and the appearance was adjusted using the layout() function, whose updatemenus argument adds the Planning Area dropdown and whose sliders argument adds the Year slider.
fig <- plot_ly(popdata,
x = ~nCount,
y = ~AG,
type = 'bar',
orientation = 'h',
hovertemplate = ~paste("<br>Age Group:", AG,
"<br>Gender:", Sex,
"<br>Population:", Count),
color = ~Sex,
transforms = list(list(type = 'filter',
target = ~Time,
operation = '=',
value = unique(popdata$Time)[1]),
list(type = 'filter',
target = ~PA,
operation = '=',
value = unique(popdata$PA)[1]))
)%>%
layout(bargap = 0.1, barmode = 'overlay',
xaxis = list(title = "Population",
tickmode = 'array', tickvals = c(-10000, -8000, -6000, -4000, -2000, 0,
2000, 4000, 6000, 8000, 10000),
ticktext = c('10k', '8k', '6k', '4k', '2k', '0',
'2k', '4k', '6k', '8k', '10k')),
yaxis = list(title = "Age Group"),
title = 'Singapore Age-Sex Population Pyramid',
updatemenus = list(list(type = 'dropdown',
x = 1.6, y = 0.6,
buttons = PA_list)
),
sliders = list(list(
active = 0,  # zero-based index: start on the first year, matching the initial filter value
currentvalue = list(prefix = "Year: "),
pad = list(t = 60),
steps = year_list)),
annotations = annot, autosize = F)
fig
Tableau is a strong tool for pattern discovery through data visualisation, and it makes polished visualisations easy to create. In contrast, R has a steeper learning curve because the language itself must be learned, but in return it offers a very large collection of libraries for data manipulation and data visualisation.
In this Take-Home Exercise, generating both interactive and animated charts is easier in Tableau, thanks to its built-in Filters and Pages features, than in R. However, R allows a greater degree of customisation of appearance and aesthetic features.